UN-3149 [FEAT] Support CSV, TXT, and Excel files in Prompt Studio#1783
UN-3149 [FEAT] Support CSV, TXT, and Excel files in Prompt Studio#1783hari-kuriakose merged 7 commits intomainfrom
Conversation
- Store original files in main directory, converted PDFs in converted/ subdir - CSV/TXT/Excel skip PDF conversion entirely and show only Raw View - Backend resolves converted PDF transparently for preview (fetch_contents_ide) - Frontend auto-switches to Raw View for non-PDF file types - Add upload_converted_for_ide() helper method - Include converted/ in directory creation and delete cleanup Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary by CodeRabbit
WalkthroughBackend adds converted-PDF preview storage and conversion-aware upload flows; frontend uses MIME type to choose preview vs. Raw View and introduces a direct-upload MIME whitelist for certain file types. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant BackendAPI
participant Converter
participant FileStorage
participant Database
Client->>BackendAPI: upload_for_ide(file)
BackendAPI->>BackendAPI: detect MIME (magic), check converter & should_convert_to_pdf
alt convert to PDF
BackendAPI->>Converter: convert file -> PDF
Converter-->>BackendAPI: converted PDF
BackendAPI->>FileStorage: upload converted PDF to converted/{name}.pdf
BackendAPI->>FileStorage: store original file in main dir
else store original
BackendAPI->>FileStorage: store original file in main dir
end
BackendAPI->>Database: create document record (original filename)
BackendAPI-->>Client: upload response
Client->>BackendAPI: fetch_contents_ide(file, view=ORIGINAL)
alt converted PDF exists
BackendAPI->>FileStorage: fetch converted/{name}.pdf
FileStorage-->>BackendAPI: converted PDF data
else fetch original
BackendAPI->>FileStorage: fetch original file
FileStorage-->>BackendAPI: original data
end
BackendAPI-->>Client: file content response (includes mimeType)
Client->>Client: render by mimeType (PDF → PdfViewer, non-PDF → placeholder / Raw View)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
frontend/src/components/custom-tools/manage-docs-modal/ManageDocsModal.jsx (1)
592-614:⚠️ Potential issue | 🟡 MinorStale error message: "Only PDF files are allowed" is no longer accurate.
When
ConfirmMultiDocis not available and the file type isn't inDIRECT_UPLOAD_TYPES, the error message still says "Only PDF files are allowed" (Line 607). Now that CSV, TXT, and Excel files are also accepted, this message is misleading.Proposed fix
if (!ConfirmMultiDoc) { setAlertDetails({ type: "error", - content: "Only PDF files are allowed", + content: "Only PDF, CSV, TXT, and Excel files are allowed", }); }backend/utils/file_storage/helpers/prompt_studio_file_helper.py (1)
178-201:⚠️ Potential issue | 🟠 MajorVerify that
text_content_stringis always defined before Line 201.If a file's MIME type is in
allowed_content_typesbut doesn't match any of the explicit branches (PDF, text/plain, text/csv, Excel), execution reaches theelseat Line 198, which only logs a warning. Control then falls to Line 201 wheretext_content_stringis referenced — but it was never assigned, causing anUnboundLocalError.This is a pre-existing issue, but the new branches make it more visible. Consider initializing
text_content_string = ""before the if/elif chain or adding a return/raise in theelsebranch.Proposed fix
def fetch_file_contents( org_id: str, user_id: str, tool_id: str, file_name: str, allowed_content_types: list[str], ) -> dict[str, Any]: """Method to fetch file contents from the remote location. The path is constructed in runtime based on the args """ + text_content_string: str = "" fs_instance = EnvHelper.get_storage(backend/prompt_studio/prompt_studio_core_v2/views.py (1)
576-605:⚠️ Potential issue | 🟡 MinorConversion failure will surface as an unhandled 500 error.
If
process_file(Line 581) raises an exception (e.g., corrupted file, unsupported format variant), there's no try/except around the conversion block, so the entire upload request fails with a generic 500. Consider wrapping the conversion in a try/except and either falling back to storing the original without a converted preview, or returning a descriptive error message.Proposed approach
if file_converter_service.should_convert_to_pdf(file_type): + try: converted_data, converted_name = file_converter_service.process_file( uploaded_file, file_name ) PromptStudioFileHelper.upload_converted_for_ide( org_id=UserSessionUtils.get_organization_id(request), user_id=custom_tool.created_by.user_id, tool_id=str(custom_tool.tool_id), file_name=converted_name, file_data=converted_data, ) + except Exception: + logger.warning( + f"Failed to convert {file_name} to PDF for preview, " + "storing original only" + ) # Reset uploaded_file for storing original in main dir uploaded_file.seek(0) file_data = uploaded_file
🤖 Fix all issues with AI agents
In `@backend/prompt_studio/prompt_studio_core_v2/views.py`:
- Line 577: The conversion branch uses the browser-supplied
uploaded_file.content_type (referenced as file_type at the if with
file_converter_plugin) instead of the server-detected MIME from FileValidator,
causing inconsistent behavior; fix by persisting the server-detected MIME (e.g.,
add a property or return value like detected_mime on FileValidator or its
validate() result) or re-run magic.from_buffer() in the view before the
conversion decision, then use that server-detected MIME (detected_mime) in the
conditional (the same place that checks file_converter_plugin and file_type) so
conversion vs direct upload is based on the validated content type.
🧹 Nitpick comments (3)
frontend/src/components/custom-tools/manage-docs-modal/ManageDocsModal.jsx (1)
593-600:DIRECT_UPLOAD_TYPESis re-created on everybeforeUploadcall inside theonloadcallback.This is a minor allocation concern. Consider hoisting it to module scope as a constant since it never changes.
Proposed refactor
Move it outside the component, near the top of the file:
const DIRECT_UPLOAD_TYPES = new Set([ "application/pdf", "text/plain", "text/csv", "application/vnd.ms-excel", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", "application/vnd.ms-excel.sheet.macroenabled.12", ]);backend/utils/file_storage/helpers/prompt_studio_file_helper.py (1)
86-116:upload_converted_for_idelargely duplicatesupload_for_ide.The two methods share the same storage-init and path-resolution logic. Consider extracting a private helper that accepts a subdirectory parameter (e.g.,
Nonefor main dir,"converted"for the converted subdir) to reduce duplication.frontend/src/components/custom-tools/document-manager/DocumentManager.jsx (1)
357-370: Consider disabling the "PDF View" tab for non-PDF files instead of showing a placeholder.Users can still manually switch to the PDF View tab for non-PDF files and see a "preview not available" placeholder. Disabling or hiding the tab would be a cleaner UX, though the current approach is functional.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
frontend/src/components/custom-tools/document-manager/DocumentManager.jsx (1)
155-161:⚠️ Potential issue | 🟠 MajorBug:
fileDatais not reset when switching documents, causing stale mimeType.When
selectedDocchanges,fileUrlandextractTxtare reset butfileData(which holdsmimeType) is not. This means between the reset and the new response arriving (or if the new request fails), the previous document'smimeTypepersists. SinceisContentAvailable(line 482-485) andrenderDoc(line 358) both depend onfileData.mimeType, stale state can cause incorrect rendering — e.g., showing the non-PDF placeholder for what should be a PDF, or marking content as available when the fetch actually failed.Proposed fix
useEffect(() => { setFileUrl(""); setExtractTxt(""); + setFileData({}); Object.keys(viewTypes).forEach((item) => { handleFetchContent(viewTypes[item]); }); }, [selectedDoc]);Also reset in
handleFetchContent:const handleFetchContent = (viewType) => { if (viewType === viewTypes.original) { setFileUrl(""); setFileErrMsg(""); + setFileData({}); }
🤖 Fix all issues with AI agents
In `@frontend/src/components/custom-tools/document-manager/DocumentManager.jsx`:
- Around line 482-485: The isContentAvailable prop can remain as written but you
must prevent stale fileData from making it true after a failed fetch: update
handleGetDocsError to clear the previous fileData (e.g., reset fileData to
null/empty object) and/or set an explicit fetch-failed flag (e.g.,
fileFetchError) and include that in the isContentAvailable check; specifically,
modify handleGetDocsError to reset the state that holds the prior mimeType
(refer to fileData and handleGetDocsError) so that isContentAvailable (which
depends on fileUrl and fileData.mimeType) cannot be true after a fetch failure.
- Around line 259-261: The FileReader.onerror handler currently throws an
exception inside the async callback (reader.onerror) which becomes unhandled;
instead, update the component's error state inside that callback (e.g., call
your state setter such as setError or setFileError with a descriptive message
and the event/error details) and avoid throwing — locate reader.onerror in
DocumentManager.jsx and replace the throw new Error("Fail to load the file")
with a call that sets the component error state and optionally logs the event so
the UI can render the error.
🧹 Nitpick comments (1)
frontend/src/components/custom-tools/document-manager/DocumentManager.jsx (1)
356-370: UnuseddocNameparameter inrenderDoc.After the refactor to mimeType-based rendering,
docNameis no longer used insiderenderDoc. Consider removing it to avoid confusion.Proposed cleanup
- const renderDoc = (docName, fileUrl, highlightData) => { + const renderDoc = (fileUrl, highlightData) => {And update the call site at line 489:
- {renderDoc(selectedDoc?.document_name, blobFileUrl, highlightData)} + {renderDoc(blobFileUrl, highlightData)}
frontend/src/components/custom-tools/document-manager/DocumentManager.jsx
Show resolved
Hide resolved
frontend/src/components/custom-tools/document-manager/DocumentManager.jsx
Show resolved
Hide resolved
…txt-excel-prompt-studio
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
frontend/src/components/custom-tools/manage-docs-modal/ManageDocsModal.jsx (2)
610-615:⚠️ Potential issue | 🟡 MinorUpdate error message to reflect all allowed file types.
The error message "Only PDF files are allowed" is now inaccurate since CSV, TXT, and Excel files can also be uploaded directly. This will confuse users who attempt to upload unsupported formats.
📝 Proposed fix
if (!ConfirmMultiDoc) { setAlertDetails({ type: "error", - content: "Only PDF files are allowed", + content: "Only PDF, CSV, TXT, and Excel files are allowed", }); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/src/components/custom-tools/manage-docs-modal/ManageDocsModal.jsx` around lines 610 - 615, The error message in ManageDocsModal.jsx currently set when ConfirmMultiDoc is false is inaccurate; update the setAlertDetails call (inside the ConfirmMultiDoc check) to use an accurate message listing the allowed types (e.g., "Only PDF, CSV, TXT, and Excel files are allowed") or a clear generic message like "Unsupported file type; allowed: PDF, CSV, TXT, Excel" so users see the correct allowed formats; ensure you update the string passed to setAlertDetails({ type: "error", content: ... }) where ConfirmMultiDoc is evaluated.
608-621:⚠️ Potential issue | 🔴 CriticalBackend serializer will reject CSV, TXT, and Excel uploads in OSS mode.
The frontend allows CSV, TXT, and Excel uploads via
DIRECT_UPLOAD_TYPES, but the backend'sFileUploadIdeSerializervalidates againstFileKey.FILE_UPLOAD_ALLOWED_EXTandFileKey.FILE_UPLOAD_ALLOWED_MIME. In OSS mode (without the file_converter plugin), these default to["pdf"]and["application/pdf"]respectively. Users will receive a serializer validation error when attempting these uploads, creating a mismatch between frontend allowlist and backend validation. Thefile_converterplugin gating only affects file conversion logic, not the initial upload validation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/src/components/custom-tools/manage-docs-modal/ManageDocsModal.jsx` around lines 608 - 621, Frontend permits CSV/TXT/Excel via DIRECT_UPLOAD_TYPES but backend FileUploadIdeSerializer enforces FileKey.FILE_UPLOAD_ALLOWED_EXT/MIME (OSS defaults to pdf), causing validation failures; update the upload decision flow in ManageDocsModal.jsx to consult the backend-supported upload types or the file_converter availability before treating a file as "direct": either fetch allowed extensions/mimes from the server (or read a passed-in prop/feature flag like isFileConverterEnabled) and use that to compute DIRECT_UPLOAD_TYPES, or fall back to only allowing PDF as direct upload; adjust the branch around DIRECT_UPLOAD_TYPES, setFileToUpload, and resolve(file) so non-supported types show the ConfirmMultiDoc/error path instead of resolving immediately.
🧹 Nitpick comments (1)
frontend/src/components/custom-tools/manage-docs-modal/ManageDocsModal.jsx (1)
598-606: MoveDIRECT_UPLOAD_TYPESto module scope.The Set is recreated on every file upload since it's defined inside the
reader.onloadcallback. Define it once at the top of the file as a module-level constant for better clarity and minor efficiency.✨ Suggested refactor
Add near the top of the file (e.g., after the imports or near line 67 with
indexTypes):const DIRECT_UPLOAD_TYPES = new Set([ "application/pdf", "text/plain", "text/csv", "application/vnd.ms-excel", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", "application/vnd.ms-excel.sheet.macroenabled.12", ]);Then remove lines 598-606 from inside
reader.onload.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/src/components/custom-tools/manage-docs-modal/ManageDocsModal.jsx` around lines 598 - 606, The DIRECT_UPLOAD_TYPES Set is currently created inside the reader.onload callback causing it to be recreated on every upload; move the constant to module scope by declaring DIRECT_UPLOAD_TYPES once near the top of ManageDocsModal.jsx (e.g., after imports or near the existing indexTypes constant) and remove the inline declaration inside the reader.onload handler so the reader.onload code references the module-level DIRECT_UPLOAD_TYPES instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@frontend/src/components/custom-tools/manage-docs-modal/ManageDocsModal.jsx`:
- Around line 610-615: The error message in ManageDocsModal.jsx currently set
when ConfirmMultiDoc is false is inaccurate; update the setAlertDetails call
(inside the ConfirmMultiDoc check) to use an accurate message listing the
allowed types (e.g., "Only PDF, CSV, TXT, and Excel files are allowed") or a
clear generic message like "Unsupported file type; allowed: PDF, CSV, TXT,
Excel" so users see the correct allowed formats; ensure you update the string
passed to setAlertDetails({ type: "error", content: ... }) where ConfirmMultiDoc
is evaluated.
- Around line 608-621: Frontend permits CSV/TXT/Excel via DIRECT_UPLOAD_TYPES
but backend FileUploadIdeSerializer enforces
FileKey.FILE_UPLOAD_ALLOWED_EXT/MIME (OSS defaults to pdf), causing validation
failures; update the upload decision flow in ManageDocsModal.jsx to consult the
backend-supported upload types or the file_converter availability before
treating a file as "direct": either fetch allowed extensions/mimes from the
server (or read a passed-in prop/feature flag like isFileConverterEnabled) and
use that to compute DIRECT_UPLOAD_TYPES, or fall back to only allowing PDF as
direct upload; adjust the branch around DIRECT_UPLOAD_TYPES, setFileToUpload,
and resolve(file) so non-supported types show the ConfirmMultiDoc/error path
instead of resolving immediately.
---
Nitpick comments:
In `@frontend/src/components/custom-tools/manage-docs-modal/ManageDocsModal.jsx`:
- Around line 598-606: The DIRECT_UPLOAD_TYPES Set is currently created inside
the reader.onload callback causing it to be recreated on every upload; move the
constant to module scope by declaring DIRECT_UPLOAD_TYPES once near the top of
ManageDocsModal.jsx (e.g., after imports or near the existing indexTypes
constant) and remove the inline declaration inside the reader.onload handler so
the reader.onload code references the module-level DIRECT_UPLOAD_TYPES instead.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (4)
backend/prompt_studio/prompt_studio_core_v2/views.pyfrontend/src/components/custom-tools/document-manager/DocumentManager.jsxfrontend/src/components/custom-tools/manage-docs-modal/ManageDocsModal.jsxfrontend/src/components/custom-tools/output-analyzer/OutputAnalyzerCard.jsx
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/prompt_studio/prompt_studio_core_v2/views.py
- frontend/src/components/custom-tools/document-manager/DocumentManager.jsx
…viewer - Use magic.from_buffer() for server-side MIME detection instead of browser-supplied content_type during file upload - Replace unhandled throw in FileReader.onerror with setFileErrMsg - Clear fileData state on fetch errors to prevent stale isContentAvailable Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
frontend/src/components/custom-tools/document-manager/DocumentManager.jsx (1)
119-119: Remove temporary console logging from document fetch/render paths.These logs add noise and may expose document metadata in production browsers.
Suggested cleanup
- console.log("here--->", fileData); @@ - console.log(selectedDoc); @@ - console.log("response-->", viewType, mimeType); @@ - console.log("blob-->", blob); @@ - console.log(fileData);Also applies to: 167-167, 255-255, 261-261, 370-370
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/src/components/custom-tools/document-manager/DocumentManager.jsx` at line 119, Remove the temporary console.log calls that expose document data (e.g., the console.log("here--->", fileData) and other console.log usages noted around the fetch/render paths) in DocumentManager.jsx; locate the log statements that print fileData and any similar debug output and delete them (or replace with a guarded debug logger that only runs in non-production) so no document metadata is emitted to the browser console during normal operation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@frontend/src/components/custom-tools/document-manager/DocumentManager.jsx`:
- Line 119: Remove the temporary console.log calls that expose document data
(e.g., the console.log("here--->", fileData) and other console.log usages noted
around the fetch/render paths) in DocumentManager.jsx; locate the log statements
that print fileData and any similar debug output and delete them (or replace
with a guarded debug logger that only runs in non-production) so no document
metadata is emitted to the browser console during normal operation.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
backend/prompt_studio/prompt_studio_core_v2/views.pyfrontend/src/components/custom-tools/document-manager/DocumentManager.jsx
Frontend Lint Report (Biome)✅ All checks passed! No linting or formatting issues found. |
|
Test ResultsSummary
Runner Tests - Full Report
SDK1 Tests - Full Report
|



What
Add support for CSV, TXT, and Excel files in Prompt Studio for enterprise users. These file types are stored as-is (no PDF conversion) and displayed in Raw View only.
Why
How
converted/subdirectory for previewfetch_contents_idechecksconverted/{name}.pdffirst for enterprise; falls back to original fileFiles Changed:
backend/prompt_studio/prompt_studio_core_v2/views.py- Upload stores original + converted; fetch resolves converted PDFbackend/utils/file_storage/helpers/prompt_studio_file_helper.py- Addconverted/dir,upload_converted_for_ide(), handle CSV/Excel MIMEfrontend/.../DocumentManager.jsx- Use mimeType for rendering, auto-switch to Raw Viewfrontend/.../ManageDocsModal.jsx- Allow CSV/TXT/Excel in upload validationCan this PR break any existing features. If yes, please list possible items. If no, please explain why.
No - Changes are enterprise-gated via
get_plugin("file_converter"). OSS behavior is unchanged (PDF-only). Existing DOCX/image/PPT uploads now store both original AND converted PDF, maintaining backward compatibility for preview while enabling original file indexing.Database Migrations
None
Env Config
None
Related Issues or PRs
Notes on Testing
file.docxin main dir +converted/file.pdfexists → PDF View shows PDF → Index uses original DOCXdata.csvin main dir, no converted file → Auto-switches to Raw View → Index uses CSVconverted/directoryChecklist
I have read and understood the Contribution Guidelines.